Methodology & general code

Throughout this notebook, we will follow the same approach. We begin with Data Loading, where we import and prepare our necessary datasets. We then move to Data Manipulation, where we process and transform our data into the appropriate format for analysis. Finally, in the Visualization section, we create plots to effectively communicate our findings.

First we load the needed packages:

# Package Loading
# Spatial data handling
library(sf) # Simple Features for spatial vector data
library(spData) # Spatial datasets
library(terra) # Raster data handling
library(exactextractr) # Precise spatial averaging
library(gdistance) # Compute minimum distances on raster surfaces

# Additional utilities
library(units) # Unit handling and conversion
library(lwgeom) # Lightweight geometry operations
library(gridExtra) # Arranging multiple plots
library(readxl) # Excel file reading

# Core data manipulation and visualization
library(tidyverse) # Data manipulation and visualization suite
library(ggplot2) # Advanced plotting (included in tidyverse)

Exercise 1: Climate change in the USA

Data Description

The analysis uses two main data sources:

  • SPEI (Standardized Precipitation Evapotranspiration Index) data:

    • Source: digital.csic.es
    • Format: NetCDF (.nc) file
    • Resolution: Monthly data
    • Variable: Drought index where negative values indicate dry conditions
  • US Geographic Data:

    • Source: spData package
    • Contains: State boundaries and regional classifications
    • Regions: Northeast, Midwest, South, and West

Data Loading

To analyze the SPEI (Standardized Precipitation Evapotranspiration Index) trends across US regions over the past 50 years, we need the NetCDF file (.nc) containing SPEI raster data and the US geographic data from spData package. The .nc file with data about the SPEI index has been downloaded from this website.

# Import SPEI index
spei_index <- rast("data/ex1/spei01.nc")

# Import US regions
us_states <- spData::us_states

Data Wrangling

We first retrieve all dates from the SPEI index, then select only dates between 1965-2015 (our 50-year analysis period, limited by data availability).

# Get dates from SPEI dataset
dates <- time(spei_index)

# Filter only for years between 1965 and 2015
valid_dates <- which(year(dates) >= 1965 & year(dates) <= 2015)

# Extract the subset of SPEI data for our time period of interest
spei_subset <- spei_index[[valid_dates]]

# Remove from 'dates' the unnecessary dates
dates <- dates[valid_dates]

Methodological Approaches

We implement and compare two methodological approaches. The zonal statistics step computes the average SPEI for each state, creating a wide-format panel where each row is a state and each column represents a specific date (month and year). We then transform this into a long-format panel, adding state and region identifiers along with separated year and month columns. This restructuring enables us to efficiently compute yearly means, first at the state level and then aggregated to the regional level. However, this aggregation creates a mean-of-means approach. Below we present a second approach in which we calculate the mean for each region directly without the route via the states. The separation of temporal components (year and month) facilitates these temporal aggregations while maintaining the geographic hierarchy (states nested within regions) for spatial analysis.

State-to-Region Aggregation (Mean-of-means)

This approach:

  • First calculates state-level means from SPEI grid cells
  • Then aggregates states to regions
  • Potential limitation: Gives equal weight to each state regardless of size
# Zonal statistics to compute the mean (each row is a country and each column is a date)
zonal_stats <- exact_extract(
  spei_subset,
  us_states,
  fun = "mean"
)
##   |                                                                              |                                                                      |   0%  |                                                                              |=                                                                     |   2%  |                                                                              |===                                                                   |   4%  |                                                                              |====                                                                  |   6%  |                                                                              |======                                                                |   8%  |                                                                              |=======                                                               |  10%  |                                                                              |=========                                                             |  12%  |                                                                              |==========                                                            |  14%  |                                                                              |===========                                                           |  16%  |                                                                              |=============                                                         |  18%  |                                                                              |==============                                                        |  20%  |                                                                              |================                                                      |  22%  |                                                                              |=================                                                     |  24%  |                                                                              |===================                                                   |  27%  |                                                                              |====================                                                  |  29%  |                                                                              |=====================                                                 |  31%  |                                                                              |=======================                                               |  33%  |                                                                              |========================                                              |  35%  |                                                                              |==========================                                            |  37%  |                                                                              |===========================                                           |  39%  |                                                                              |=============================                                         |  41%  |                                                                              |==============================                                        |  43%  |                                                                              |===============================                                       |  45%  |                                                                              |=================================                                     |  47%  |                                                                              |==================================                                    |  49%  |                                                                              |====================================                                  |  51%  |                                                                              |=====================================                                 |  53%  |                                                                              |=======================================                               |  55%  |                                                                              |========================================                              |  57%  |                                                                              |=========================================                             |  59%  |                                                                              |===========================================                           |  61%  |                                                                              |============================================                          |  63%  |                                                                              |==============================================                        |  65%  |                                                                              |===============================================                       |  67%  |                                                                              |=================================================                     |  69%  |                                                                              |==================================================                    |  71%  |                                                                              |===================================================                   |  73%  |                                                                              |=====================================================                 |  76%  |                                                                              |======================================================                |  78%  |                                                                              |========================================================              |  80%  |                                                                              |=========================================================             |  82%  |                                                                              |===========================================================           |  84%  |                                                                              |============================================================          |  86%  |                                                                              |=============================================================         |  88%  |                                                                              |===============================================================       |  90%  |                                                                              |================================================================      |  92%  |                                                                              |==================================================================    |  94%  |                                                                              |===================================================================   |  96%  |                                                                              |===================================================================== |  98%  |                                                                              |======================================================================| 100%
# Convert the wide format panel to long format panel and add state/region identifiers
# This is necessary to compute the yearly mean per country and then the yearly mean per region
spei_long <- zonal_stats %>%
  as.data.frame() %>%
  # Convert from wide to long format
  pivot_longer(
    cols = everything(),
    names_to = "time_index",
    values_to = "spei"
  ) %>%
  # Add state and region identifiers, plus temporal information
  mutate(
    state = rep(us_states$NAME, each = length(dates)),
    region = rep(us_states$REGION, each = length(dates)),
    date = rep(dates, times = nrow(us_states)),
    year = year(date),
    month = month(date)
  ) %>%
  select(state, region, year, month, spei)

# Compute the yearly mean for each state
spei_state_annual <- spei_long %>%
  group_by(state, region, year) %>%
  summarise(
    mean_spei_state = mean(spei, na.rm = TRUE),
    .groups = "drop"
  )

# Compute the yearly mean for each region
spei_regional <- spei_state_annual %>%
  group_by(region, year) %>%
  summarise(
    mean_spei_region = mean(mean_spei_state, na.rm = TRUE),
    .groups = "drop"
  )

Plotting

After these manipulations, we can create a plot that shows the temporal trends of SPEI index for each region (colored lines) and the overall smoothed trend with confidence interval (blue line).

# Plotting
ggplot(spei_regional, aes(x = year, y = mean_spei_region, color = region)) +
  geom_line() +
  geom_smooth(
    data = spei_regional, aes(x = year, y = mean_spei_region),
    method = "loess", color = "blue", se = TRUE
  ) +
  scale_color_manual(values = c("red", "green", "turquoise", "purple")) +
  scale_x_continuous(
    breaks = seq(1970, 2010, by = 10),
    expand = c(0.02, 0.02)
  ) +
  scale_y_continuous(
    limits = c(-1, 1),
    breaks = seq(-1, 1, by = 1)
  ) +
  labs(
    title = "SPEI Index by US Region (1965-2015) - mean-of-means approach",
    x = "Year",
    y = "SPEI Index",
    color = "Region"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
  )

Direct Regional Calculation

Now we will use the direct mean approach by calculating means immediately on the region level across years. Key advantages:

  • Directly calculates regional means from SPEI grid cells
  • Maintains proper spatial weighting
  • More computationally efficient
  • Theoretically more sound for spatial averaging
# Zonal statistics directly by region (grouping states by region first)
regions_sf <- us_states %>%
  group_by(REGION) %>%
  summarise(geometry = st_union(geometry))

# Calculate SPEI means directly for regions
zonal_stats_regional <- exact_extract(
  spei_subset,
  regions_sf,
  fun = "mean"
)
##   |                                                                              |                                                                      |   0%  |                                                                              |==================                                                    |  25%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================================                  |  75%  |                                                                              |======================================================================| 100%
# Convert to long format with regional identifiers
spei_regional_direct <- zonal_stats_regional %>%
  as.data.frame() %>%
  mutate(region = regions_sf$REGION) %>%
  pivot_longer(
    cols = -region,
    names_to = "time_index",
    values_to = "spei"
  ) %>%
  mutate(
    date = rep(dates, times = nrow(regions_sf)),
    year = year(date)
  ) %>%
  group_by(region, year) %>%
  summarise(
    mean_spei_region = mean(spei, na.rm = TRUE),
    .groups = "drop"
  )

Plotting

Next we plot the results and though the differences to our previous approach are minimal. However, the direct mean approach is more efficient and should be more what we are looking for:

# Plotting
ggplot(spei_regional_direct, aes(x = year, y = mean_spei_region, color = region)) +
  geom_line() +
  geom_smooth(
    data = spei_regional_direct, aes(x = year, y = mean_spei_region),
    method = "loess", color = "blue", se = TRUE
  ) +
  scale_color_manual(values = c("red", "green", "turquoise", "purple")) +
  scale_x_continuous(
    breaks = seq(1970, 2010, by = 10),
    expand = c(0.02, 0.02)
  ) +
  scale_y_continuous(
    limits = c(-1, 1),
    breaks = seq(-1, 1, by = 1)
  ) +
  labs(
    title = "SPEI Index by US Region (1965-2015)",
    x = "Year",
    y = "SPEI Index",
    color = "Region"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
  )

Results Interpretation

The plots show:

  • SPEI trends for each region (colored lines)
  • Overall trend (blue smoothed line with confidence interval)
  • Time period: 1965-2015
  • Y-axis range: -1 to 1 (drought to wet conditions)

While both approaches yield similar results due to the linear nature of averaging, the direct regional calculation is methodologically preferred for spatial climate data analysis for several key reasons:

  1. Spatial Weighting: The mean-of-means approach gives equal weight to each state regardless of its size, potentially overrepresenting smaller states in regional averages. The direct approach properly weights each grid cell based on its actual area contribution to the region.
  2. Boundary Effects: State boundaries are political, not climatological. The mean-of-means approach artificially segments climate patterns along state lines before aggregation, while the direct approach preserves natural climate continuity across regions.
  3. Statistical Accuracy: Multiple averaging steps in the mean-of-means approach can introduce compounding biases. The direct calculation minimizes this by reducing the number of aggregation steps.
  4. Computational Efficiency: The direct approach requires fewer calculation steps and less memory, making it more efficient for large datasets.

While both methods yield similar results in this case (due to the linear nature of averaging and relatively uniform spatial distribution), the direct regional calculation better represents the true spatial characteristics of climate data.

In comparison the first graph to be replicated looked as follows:

Figure 1
Figure 1

Discussion of Plot Differences

After comparing both visualizations, the most likely reasons for the differences between our plot and the reference plot are:

  • Different SPEI Timescale: Our code specifically uses spei01.nc (1-month SPEI), while the original plot likely uses a longer timescale index such as SPEI-12 or SPEI-24. Longer timescales produce smoother lines as they capture longer-term drought conditions rather than monthly fluctuations.
  • Temporal Aggregation Method: Our code computes simple annual averages from monthly data, whereas the original plot might use a different aggregation approach (e.g., weighted seasonal averages or rolling averages) that reduces volatility.
  • Smoothing Parameters: The original plot appears to use more aggressive smoothing for both regional lines and the trend line. Our geom_smooth() with default LOESS parameters may use different span or degree settings than the original.
  • Data Pre-processing: The original may have applied additional data cleaning steps such as outlier removal or drought-specific thresholding that aren’t present in our implementation.
  • Spatial Aggregation Differences: The original plot might use different regional boundary definitions or apply population-weighted rather than area-weighted averaging.
  • Year Range Subsetting: Despite both claiming 1965-2015, the exact start/end dates might differ slightly, affecting trend visualization, especially at the endpoints.
  • Different SPEI Version: The SPEI calculation methodology itself has evolved over time, and different versions of the dataset might be used.

The most significant factor is likely the SPEI timescale difference, as this would fundamentally change the volatility patterns visible in the visualization.


Exercise 2: Transportation Centrality

Part A

Data Loading

To calculate the distances between the 10 top populated cities in Spain, and represent the roads within continental Spain. We used data from the Natural Earth website. Precisely, we loaded:

  • ne_10m_populated_places: for the most populated cities
  • ne_10m_roads: for the streets
# Load populated cities and roads
places <- st_read("data/ex2/ne_10m_populated_places")
## Reading layer `ne_10m_populated_places' from data source 
##   `C:\Users\mmpei\OneDrive\Uni\BSE\Trimester 2\Geospatial\bse_geospatial\hw3\data\ex2\ne_10m_populated_places' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 7342 features and 137 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -179.59 ymin: -90 xmax: 179.3833 ymax: 82.48332
## Geodetic CRS:  WGS 84
roads <- st_read("data/ex2/ne_10m_roads")
## Reading layer `ne_10m_roads' from data source 
##   `C:\Users\mmpei\OneDrive\Uni\BSE\Trimester 2\Geospatial\bse_geospatial\hw3\data\ex2\ne_10m_roads' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 56600 features and 31 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: -166.5325 ymin: -55.11212 xmax: 178.4191 ymax: 71.17768
## Geodetic CRS:  WGS 84

Data Wrangling

From the R package spData, we use the world dataset to extract the geometry of Spain, filtering only for continental Spain i.e., we exclude roads outside continental Spain which seems to not have happened in the initial graph were roads overflow frontiers. Then, from the dataset containing the most populated cities, we extract only the cities in Spain, sort them in decreasing order of population, and retain only the top 10. Finally, from these top 10 cities, we extract Madrid and Vigo, which will be used in the plot.

# Filter for continental Spain only
spain <- world %>%
  filter(name_long == "Spain") %>%
  st_cast("POLYGON") %>%
  slice_max(st_area(.))

# Get top 10 most populated Spanish cities
spain_top_10_cities <- places %>%
  filter(ADM0NAME == "Spain") %>%
  arrange(desc(POP_MAX)) %>%
  slice_head(n = 10)

# Filter Madrid and Vigo for the plot
madrid_vigo <- spain_top_10_cities %>%
  filter(NAME %in% c("Madrid", "Vigo"))

Methodological Approach

First, we transform the CRS from EPSG:4326 to EPSG:3035, as the former uses latitude and longitude, which would lead to incorrect distance calculations in kilometers. Next, we intersect the road network with Spain’s boundaries to restrict the analysis to roads that are actually within Spain. We then create an empty raster over Spain, set a high resolution, and define the CRS as EPSG:3035 for the same reason mentioned earlier. After that, we generate the friction surface by assigning a value of 1 to cells that contain roads and 100 to cells that do not. This ensures that areas without roads have higher friction, influencing the computation of the shortest path between cities. Subsequently, we compute the transition matrix and, finally, calculate and print the distance matrix, which includes all pairwise distances between the top 10 most populated Spanish cities.

# Transform to planar EPSG:3035 for better distance calculation in km
spain_planar <- st_transform(spain, 3035)
roads_planar <- st_transform(roads, 3035)
cities_planar <- st_transform(spain_top_10_cities, 3035)

# Intersect roads with Spain's boundaries
spain_roads <- st_intersection(roads_planar, spain_planar)

# Create and prepare a raster on the top of Spain
raster <- rast(ext(spain_planar), resolution = 5000, crs = "EPSG:3035")
roads_vect <- vect(spain_roads)
road_raster <- rasterize(roads_vect, raster, field = 1)

# Create friction surface: we assign 100 to cells without streets and 1 with cells
# with streets. In this way we increase the friction for areas without streets, and so
# streets will be chosen when computing the distances between places.
road_values <- values(road_raster)
road_values[is.na(road_values)] <- 100
road_values[!is.na(road_values)] <- 1
values(road_raster) <- road_values

# Create transition matrix for distance calculation
tr_matrix <- transition(raster(road_raster),
  transitionFunction = mean,
  directions = 8
)

# Calculate distances between all city pairs
n_cities <- nrow(cities_planar)
dist_matrix <- matrix(0, n_cities, n_cities)
rownames(dist_matrix) <- cities_planar$NAME
colnames(dist_matrix) <- cities_planar$NAME

for (i in 1:n_cities) {
  for (j in 1:n_cities) {
    if (i < j) { # Only calculate the upper triangle and mirror
      path <- shortestPath(tr_matrix,
        origin = as_Spatial(cities_planar[i, ]),
        goal = as_Spatial(cities_planar[j, ]),
        output = "SpatialLines"
      )
      path_sf <- st_as_sf(path)
      st_crs(path_sf) <- st_crs(cities_planar)
      # Calculate distance in kilometers
      dist <- as.numeric(st_length(path_sf)) / 1000
      dist_matrix[i, j] <- dist
      # Mirror the distance in lower triangle
      dist_matrix[j, i] <- dist
    }
  }
}

# Print the distances
print("Distances between cities (km):")
## [1] "Distances between cities (km):"
print(round(dist_matrix, 1))
##           Madrid Barcelona Seville Bilbao Valencia Zaragoza Málaga Murcia
## Madrid       0.0     693.3   403.8  397.3    354.0    356.4  543.2  407.6
## Barcelona  693.3       0.0  1026.9  549.4    331.1    345.2  864.7  469.5
## Seville    403.8    1026.9     0.0  809.4    708.3    710.7  174.7  600.5
## Bilbao     397.3     549.4   809.4    0.0    602.3    283.7  957.0  829.8
## Valencia   354.0     331.1   708.3  602.3      0.0    335.2  529.5  215.0
## Zaragoza   356.4     345.2   710.7  283.7    335.2      0.0  681.6  537.8
## Málaga     543.2     864.7   174.7  957.0    529.5    681.6    0.0  417.6
## Murcia     407.6     469.5   600.5  829.8    215.0    537.8  417.6    0.0
## Granada‎    491.6     757.8   277.4  901.3    422.6    634.2  106.9  310.7
## Vigo       517.8    1086.6   691.1  678.3    896.6    853.5  884.1  818.9
##           Granada‎   Vigo
## Madrid      491.6  517.8
## Barcelona   757.8 1086.6
## Seville     277.4  691.1
## Bilbao      901.3  678.3
## Valencia    422.6  896.6
## Zaragoza    634.2  853.5
## Málaga      106.9  884.1
## Murcia      310.7  818.9
## Granada‎       0.0  820.2
## Vigo        820.2    0.0

Plotting - Map of Spain with Roads and Cities

After these manipulations, we can plot continental Spain with its roads, highlighting Madrid and Vigo, as shown in the reference plot.

# Plotting
ggplot() +
  geom_sf(data = st_transform(spain, 4326), fill = "lightgray") +
  geom_sf(data = st_transform(spain_roads, 4326), color = "black", size = 0.3) +
  geom_sf(data = st_transform(spain_top_10_cities, 4326), size = 1, color = "black") +
  geom_sf(
    data = st_transform(madrid_vigo %>% filter(NAME == "Madrid"), 4326),
    aes(shape = "Madrid", color = "Madrid"), size = 3
  ) +
  geom_sf(
    data = st_transform(madrid_vigo %>% filter(NAME == "Vigo"), 4326),
    aes(shape = "Vigo", color = "Vigo"), size = 3
  ) +
  scale_color_manual(
    name = "Cities",
    values = c("Madrid" = "red", "Vigo" = "blue")
  ) +
  scale_shape_manual(
    name = "Cities",
    values = c("Madrid" = 16, "Vigo" = 17)
  ) +
  theme(
    # Make background white
    panel.background = element_rect(fill = "white")
  )

In comparison the first map looked as follows:

Map 1
Map 1

Part B - Distance Density Plots

Data Wrangling

Using the distance matrix calculated in Part A, we now assess the isolation of Madrid and Vigo by analyzing their bilateral distances to the other top 10 most populated cities in Spain. We begin by extracting the distances for Madrid and Vigo. Note, however, that for the average distance calculation, that we exclude the cities from the calculations with themselves to exclude the self-distance of 0.

# Extract city names from the top 10 cities in long format
df_dist <- as.data.frame(dist_matrix) %>%
  rownames_to_column(var = "origin") %>%
  # Set up long format of the matrix
  gather(key = "destination", value = "distance", -origin) %>%
  # Exclude all other cities from origins
  filter(origin %in% c("Madrid", "Vigo")) %>% 
  # Order first by origin, then by destination
  arrange(origin, destination)

# Now exclude the self-distances
df_dist_wo_self <- df_dist %>%
  filter(origin != destination)

# Extract distances from Madrid to other cities
madrid_distances <- df_dist_wo_self %>%
  filter(origin == "Madrid") %>%
  pull(distance)

# Extract distances from Vigo to other cities
vigo_distances <- df_dist_wo_self %>%
  filter(origin == "Vigo") %>%
  pull(distance)

Analysis

Isolation is determined by comparing the average distances from each city to the others, where a higher average distance indicates greater isolation. We compute summary statistics, and calculate the average distance for each city to make a quantitative comparison.

# Calculate summary statistics for isolation
madrid_summary <- summary(madrid_distances)
vigo_summary <- summary(vigo_distances)

# Print summary statistics to compare isolation
cat("\nSummary of distances from Madrid to other top 10 cities (km):\n")
## 
## Summary of distances from Madrid to other top 10 cities (km):
print(round(madrid_summary, 1))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   354.0   397.3   407.6   462.8   517.8   693.3
cat("\nSummary of distances from Vigo to other top 10 cities (km):\n")
## 
## Summary of distances from Vigo to other top 10 cities (km):
print(round(vigo_summary, 1))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   517.8   691.1   820.2   805.2   884.1  1086.6
# Calculate average distances to assess isolation
madrid_avg_distance <- mean(madrid_distances)
vigo_avg_distance <- mean(vigo_distances)

cat("\nAverage distance from Madrid to other cities:", round(madrid_avg_distance, 1), "km\n")
## 
## Average distance from Madrid to other cities: 462.8 km
cat("Average distance from Vigo to other cities:", round(vigo_avg_distance, 1), "km\n")
## Average distance from Vigo to other cities: 805.2 km
# Determine which city is more isolated (higher average distance indicates more isolation)
if (madrid_avg_distance > vigo_avg_distance) {
  cat("Madrid appears more isolated based on average distance.\n")
} else if (vigo_avg_distance > madrid_avg_distance) {
  cat("Vigo appears more isolated based on average distance.\n")
} else {
  cat("Madrid and Vigo have similar isolation based on average distance.\n")
}
## Vigo appears more isolated based on average distance.

Plotting - Distance Density Plot (excluding self-distances)

To further explore the isolation of Madrid and Vigo, we create smoothed density distributions of their distances to the other top 10 cities. This visualization helps us understand the spread and concentration of distances, where a distribution with a wider spread or higher density at greater distances suggests greater isolation. We use geom_density() to plot the distributions, with Madrid in red and Vigo in turquoise, matching the style of the reference plot. However, we exclude the self-distance of 0.

# Plot smoothed density distributions using geom_density()
density_plot <- ggplot(df_dist_wo_self, aes(x = distance, color = origin, fill = origin)) +
  geom_density(alpha = 0.3) +  # Add transparency for overlap
  scale_color_manual(values = c("Madrid" = "red", "Vigo" = "turquoise")) +
  scale_fill_manual(values = c("Madrid" = "red", "Vigo" = "turquoise")) +
  labs(
    title = "Smoothed Density Distributions of Distances",
    x = "Distance (km)",
    y = "Density",
    color = "City",
    fill = "City"
  ) +
  theme_minimal()

# Display the plot
print(density_plot)

In comparison the second reference graph looked as follows:

Map 2
Map 2

Plotting - Distance Density Plot (including self-distances)

Although we think it seems logical to exclude the self-distances of Madrid and Vigo from the distance calculations, the reference plot seems to include them, as displayed by the noticeable density at x=0 for both cities. Thus, in this section, we compute the same plot but with “self-distances” included.

# Plot smoothed density distributions using geom_density(), including self-distances
density_plot_all <- ggplot(df_dist, aes(x = distance, color = origin, fill = origin)) +
  geom_density(alpha = 0.3) +  # Add transparency for overlap
  scale_color_manual(values = c("Madrid" = "red", "Vigo" = "turquoise"), labels = c("Madrid", "Vigo")) +
  scale_fill_manual(values = c("Madrid" = "red", "Vigo" = "turquoise"), labels = c("Madrid", "Vigo")) +
  labs(
    x = "distance",
    y = "density",
    color = "origin",
    fill = "origin"
  ) +
  theme_minimal() +
  theme(
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
  )

# Display the plot
print(density_plot_all)

Discussion of Plot Differences: Self-Distances and Calculation Parameters

Self-Distances Methodological Considerations

The decision to include or exclude self-distances (city to itself) significantly impacts isolation analysis:

  • Excluding self-distances (preferred for isolation studies) focuses on inter-city relationships, avoids artificial density spikes, and aligns with network analysis conventions.

  • Including self-distances might be justified when measuring overall accessibility rather than isolation specifically, or when considering all possible origin-destination pairs in a transportation system.

Geospatial Parameters Affecting Results

The differences between our plot and the reference are likely due to our choice of calculation parameters:

  1. Raster Resolution (5000 units): Finer resolution would capture road networks more precisely but at computational cost.

  2. Friction Surface Values (1:100 ratio): This strongly influences path selection and could dramatically change distances if the reference used different values.

  3. Road Network Data: Differences in the Natural Earth road dataset completeness or classification.

  4. Transition Matrix Configuration: The 8-direction setting affects possible path trajectories.

  5. Continental Spain Definition: Different methods for isolating mainland Spain could affect which roads are included.

Density Visualization Differences

Additional technical factors in the visualization itself:

  1. Bandwidth Parameter: Different smoothing parameters in geom_density() would change curve shapes.

  2. Alpha Transparency and Color Settings: Affect visual appearance of overlapping areas.

  3. Kernel Function: Different mathematical functions for density estimation produce different curves.

Despite these differences, both plots support the same conclusion that Vigo is more isolated than Madrid, with Vigo’s distribution showing consistently greater distances to other population centers.